AITopics | test data

TTRL: Test-Time Reinforcement Learning

Neural Information Processing SystemsJun-14-2026, 02:36:47 GMT

This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth information. While this setting appears elusive, we find that common practices in Test-Time Scaling (TTS), such as majority voting, yield surprisingly effective rewards suitable for driving RL training. In this work, we introduce Test-Time Reinforcement Learning (TTRL), a novel method for training LLMs using RL on unlabeled data. TTRL enables self-evolution of LLMs by utilizing the priors in the pre-trained models. Our experiments demonstrate that TTRL consistently improves performance across a variety of tasks and models. Notably, TTRL boosts the pass@1 performance of Qwen-2.5-Math-7B by approximately 211% on the AIME 2024 with only unlabeled test data. Furthermore, although TTRL is only supervised by the Maj@N metric, TTRL has demonstrated performance to consistently surpass the upper limit of the initial model, and approach the performance of models trained directly on test data with ground-truth labels. Our experimental findings validate the general effectiveness of TTRL across various tasks and highlight TTRL's potential for broader tasks and domains.

large language model, machine learning, natural language, (8 more...)

Neural Information Processing Systems

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

cdd0640218a27e9e2c0e52e324e25db0-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 19:47:50 GMT

machine learning, natural language, swapprompt, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

adc98a266f45005c403b8311ca7e8bd7-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 10:08:28 GMT

artificial intelligence, machine learning, probability, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Add feedback

481fbfa59da2581098e841b7afc122f1-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 17:15:13 GMT

The code for our experiments is available at https://github.com/AndyShih12/HyperSPN. To examine the merits of HyperSPNs as discussed in Section 3, we construct a hand-crafted dataset to test the three types of models described in Figure 4: SPN-Large, SPN-Small, and HyperSPN. The hand-crafted dataset is procedurally generated with 256 binary variables and 10000 instances, broken into train/valid/test splits at 70/10/20%. The generation procedure is designed such that the correlation between variable i and j is dependent on the path length between leaves i and j of a complete binary tree over the 256 variables. The exact details can be found in our code.

artificial intelligence, hyperspn, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Add feedback

Equal Opportunity of Coverage in Fair Regression

Neural Information Processing SystemsApr-25-2026, 09:21:29 GMT

We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making. The seminal work of "equalized coverage" proposed an uncertainty-aware fairness notion. However, it does not guarantee equal coverage rates across more fine-grained groups (e.g., low-income females) conditioning on the true label and is biased in the assessment of uncertainty. To tackle these limitations, we propose a new uncertainty-aware fairness - Equal Opportunity of Coverage (EOC) - that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level. Further, the prediction intervals should be narrow to be informative. We propose Binned Fair Quantile Regression (BFQR), a distribution-free post-processing method to improve EOC with reasonable width for any trained ML models. It first calibrates a hold-out set to bound deviation from EOC, then leverages conformal prediction to maintain EOC on a test set, meanwhile optimizing prediction interval width. Experimental results demonstrate the effectiveness of our method in improving EOC.

artificial intelligence, coverage rate, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois (0.15)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

2857242c9e97de339ce642e75b15ff24-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 04:21:21 GMT

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

05fb0f4e645cad23e0ab59d6b9901428-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 08:55:59 GMT

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science (0.68)

Add feedback

RAAGBl Wh25-3535-5050-6565-80Acc 2 s21 s63 s74 s54 s298 s685 s660 s40% 0�mpaaaacmpmpmpmpiaaaECEtkmpmpmpsleeEtllllseeeeilllmsssseeesss ate MAE vs Oracle

Neural Information Processing SystemsApr-24-2026, 08:55:55 GMT

Evaluating the performance of machine learning models on diverse and underrepresented subgroups is essential for ensuring fairness and reliability in real-world applications. However, accurately assessing model performance becomes challenging due to two main issues: (1) a scarcity of test data, especially for small subgroups, and (2) possible distributional shifts in the model's deployment setting, which may not align with the available test data. In this work, we introduce 3STesting, a deep generative modeling framework to facilitate model evaluation by generating synthetic test sets for small subgroups and simulating distributional shifts. Our experiments demonstrate that 3STesting outperforms traditional baselines--including real test data alone--in estimating model performance on minority subgroups and under plausible distributional shifts. In addition, 3S offers intervals around its performance estimates, exhibiting superior coverage of the ground truth compared to existing approaches. Overall, these results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)

Add feedback

Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning

Neural Information Processing SystemsMar-20-2026, 05:03:21 GMT

Federated learning (FL) is an effective machine learning paradigm where multiple clients can train models based on heterogeneous data in a decentralized manner without accessing their private data. However, existing FL systems undergo performance deterioration due to feature-level test-time shifts, which are well investigated in centralized settings but rarely studied in FL. The common non-IID issue in FL usually refers to inter-client heterogeneity during training phase, while the test-time shift refers to the intra-client heterogeneity during test phase. Although the former is always deemed to be notorious for FL, there is still a wealth of useful information delivered by heterogeneous data sources, which may potentially help alleviate the latter issue. To explore the possibility of using inter-client heterogeneity in handling intra-client heterogeneity, we firstly propose a contrastive learning-based FL framework, namely FedICON, to capture invariant knowledge among heterogeneous clients and consistently tune the model to adapt to test data. In FedICON, each client performs sample-wise supervised contrastive learning during the local training phase, which enhances sample-wise invariance encoding ability. Through global aggregation, the invariance extraction ability can be mutually boosted among inter-client heterogeneity. During the test phase, our test-time adaptation procedure leverages unsupervised contrastive learning to guide the model to smoothly generalize to test data under intra-client heterogeneity. Extensive experiments validate the effectiveness of the proposed FedICON in taming heterogeneity to handle test-time shift problems.

artificial intelligence, heterogeneity, machine learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Hand-Crafted Example

Neural Information Processing SystemsFeb-19-2026, 01:29:52 GMT

The code for our experiments is available at https://github.com/AndyShih12/HyperSPN. To examine the merits of HyperSPNs as discussed in Section 3, we construct a hand-crafted dataset to test the three types of models described in Figure 4: SPN-Large, SPN-Small, and HyperSPN. The hand-crafted dataset is procedurally generated with 256 binary variables and 10000 instances, broken into train/valid/test splits at 70/10/20%. The generation procedure is designed such that the correlation between variable i and j is dependent on the path length between leaves i and j of a complete binary tree over the 256 variables. The exact details can be found in our code.

artificial intelligence, hyperspn, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Add feedback

Filters

Collaborating Authors

test data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

TTRL: Test-Time Reinforcement Learning

cdd0640218a27e9e2c0e52e324e25db0-Paper-Conference.pdf

adc98a266f45005c403b8311ca7e8bd7-Supplemental-Conference.pdf

481fbfa59da2581098e841b7afc122f1-Supplemental.pdf

Equal Opportunity of Coverage in Fair Regression

2857242c9e97de339ce642e75b15ff24-Supplemental-Conference.pdf

05fb0f4e645cad23e0ab59d6b9901428-Supplemental-Conference.pdf

RAAGBl Wh25-3535-5050-6565-80Acc 2 s21 s63 s74 s54 s298 s685 s660 s40% 0�mpaaaacmpmpmpmpiaaaECEtkmpmpmpsleeEtllllseeeeilllmsssseeesss ate MAE vs Oracle

Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning

A Hand-Crafted Example